Conversation

cmac86 (Member) commented Feb 6, 2026

Summary

  • Adds deterministic short-term memory with three storage mechanisms: auto-store from tool memory_hint, explicit memory_short tool (store/get/delete/list), and HTTP API
  • File-based JSON persistence on Docker caal-memory volume with 7-day default TTL
  • Memory context injected as LLM awareness hint (after first user message) to enable tool chaining — e.g. "is my flight on time?" triggers memory_short(get) → flight_tracker
  • Frontend Memory Panel (Brain icon) with entry list, source badges (tool/voice/api), inline edit, and clear all
  • i18n translations for en, fr, it
  • Fixes Docker permission issues for /app/registry_cache.json and memory persistence

Architecture

src/caal/memory/          # Package (future-proofed for long-term memory)
├── base.py               # Shared types (MemoryEntry, MemorySource, MemoryStore)
├── short_term.py         # ShortTermMemory singleton, file persistence, TTL
└── __init__.py

src/caal/integrations/memory_tool.py  # MemoryTools mixin (memory_short function_tool)
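
To make the shapes concrete, here is a minimal sketch of what the shared types might look like; the field names, defaults, and the expired() helper are illustrative assumptions, not the actual contents of base.py:

```python
# Hypothetical sketch of the base.py types; fields and defaults are assumptions.
import time
from dataclasses import dataclass, field
from enum import Enum

DEFAULT_TTL_SECONDS = 7 * 24 * 3600  # 7-day default TTL (604800 s)


class MemorySource(str, Enum):
    TOOL = "tool"    # auto-stored from a workflow's memory_hint
    VOICE = "voice"  # explicit "remember ..." via the memory_short tool
    API = "api"      # stored through POST /memory


@dataclass
class MemoryEntry:
    key: str
    value: str
    source: MemorySource
    created_at: float = field(default_factory=time.time)
    ttl: float | None = DEFAULT_TTL_SECONDS  # None means no expiry

    def expired(self, now: float | None = None) -> bool:
        if self.ttl is None:
            return False
        return (now or time.time()) - self.created_at > self.ttl
```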

Three storage paths:

  1. Tool hint — n8n workflows return memory_hint in response → auto-stored
  2. Explicit tool — user says "remember my flight is UA1234" → memory_short(store)
  3. HTTP API — POST /memory for external systems

Context injection serves as an awareness layer — the LLM sees what's in memory so it knows to chain tools (e.g. pull email from memory → send via Gmail), but retrieval still goes through the tool for verification.
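
As a rough illustration of that awareness layer (the function name, message shape, and hint wording below are assumptions, not CAAL's actual code), the hint only lists which keys memory holds and is skipped until the first user message so the greeting stays clean:

```python
def build_memory_hint(memory: dict[str, str], messages: list[dict]) -> dict | None:
    """Return a system message listing memory keys, or None if injection is skipped."""
    # Skip injection before the first user message so the greeting stays clean.
    if not any(m.get("role") == "user" for m in messages):
        return None
    if not memory:
        return None
    keys = ", ".join(sorted(memory))
    return {
        "role": "system",
        "content": (
            f"Short-term memory currently holds: {keys}. "
            "Call memory_short(get) to read a value before chaining to other tools."
        ),
    }
```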

Test plan

  • API: store, get, list, delete, clear via curl
  • Explicit tool: "remember my flight number is UA1234" → stored and retrievable
  • Tool chaining: "is my flight on time?" → memory_short(get) → web_search
  • Cross-tool chaining: "email Ashley" → memory_short(get) → gmail(send)
  • Clean greeting: memory not announced on session start
  • Frontend panel: entries display with source badges, timestamps, TTL
  • Inline edit: pencil icon → textarea → save
  • Docker persistence: entries survive container restart via /app/data volume

🤖 Generated with Claude Code

cmac86 and others added 10 commits February 3, 2026 14:16
Adds deterministic short-term memory with three storage mechanisms:
- Auto-store from tool responses via memory_hint field
- Explicit memory_short tool (store/get/delete/list actions)
- HTTP API endpoints for external access

Backend: src/caal/memory/ package with file-based JSON persistence,
singleton pattern, TTL support, and context injection into LLM.

Frontend: Memory Panel UI with Brain icon button, entry list,
detail modal, and clear all functionality.

Includes i18n translations for en, fr, it.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change default TTL from 24h to 7 days (604800s)
- Allow tools to specify custom TTL in memory_hint:
  - Simple value: uses default 7d TTL
  - {"value": ..., "ttl": seconds}: custom TTL
  - {"value": ..., "ttl": null}: no expiry

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
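
A small sketch of how those three memory_hint shapes could be normalized; parse_memory_hint and its return convention are assumptions for illustration, not the actual implementation:

```python
DEFAULT_TTL = 7 * 24 * 3600  # 604800 seconds


def parse_memory_hint(hint):
    """Return (value, ttl_in_seconds_or_None) for a tool's memory_hint field."""
    if isinstance(hint, dict) and "value" in hint:
        # {"value": ..., "ttl": seconds} -> custom TTL
        # {"value": ..., "ttl": null}    -> ttl is None, i.e. no expiry
        return hint["value"], hint.get("ttl", DEFAULT_TTL)
    # Bare value -> default 7-day TTL
    return hint, DEFAULT_TTL
```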
Replace linear execute→stream→retry with a loop that supports
multi-step tool chaining. Model can now: call tool A → get result →
call tool B → get result → generate text response.

Previously, after one tool execution the code tried to stream a text
response. If the model wanted to chain (call another tool), it
produced 0 text chunks, triggering a retry without tools that crashed
Ollama (tool references in messages but no tools registered).

New flow:
- Loop non-streaming chat() calls (max 5 rounds)
- Each round: if tool_calls → execute → loop back
- When no tool_calls → yield content or stream final response
- Safety fallback: _strip_tool_messages converts tool messages to
  plain text if Ollama still crashes on the streaming path

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
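
A simplified sketch of that flow; chat, execute_tool, and stream_final are passed in as stand-ins because the real signatures aren't shown here, and the message shapes are assumptions:

```python
# Simplified sketch of the chained tool loop. chat, execute_tool and
# stream_final stand in for the real LLM/tool calls; names and message
# shapes are assumptions, not CAAL's implementation.
def run_with_tools(chat, execute_tool, stream_final, messages, tools, max_rounds=5):
    """Yield response text, letting the model chain tool calls across rounds."""
    for _ in range(max_rounds):
        reply = chat(messages, tools=tools, stream=False)  # non-streaming round
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            if reply.get("content"):
                yield reply["content"]             # model answered in this round
            else:
                yield from stream_final(messages)  # stream the final text response
            return
        messages.append(reply)  # keep the assistant turn that requested the tools
        for call in tool_calls:
            messages.append({
                "role": "tool",
                "name": call["name"],
                "content": execute_tool(call),
            })
    # If the round limit is hit, fall back to a plain text answer, in the same
    # spirit as the _strip_tool_messages safety path described above.
    yield from stream_final(messages)
```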
…o context

- Deduplicate identical tool calls within a single round (same name + args)
- Accumulate tool names/params across chained rounds for frontend indicator
- Keep tool indicator showing after response (don't clear when tools were used)
- Include tool call arguments in ToolDataCache context injection

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
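
The per-round deduplication could look roughly like this; the helper name and the tool-call dict shape (name/arguments keys) are assumptions:

```python
import json


def dedupe_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Drop repeated calls with the same name and arguments within one round."""
    seen, unique = set(), []
    for call in tool_calls:
        key = (call["name"], json.dumps(call.get("arguments", {}), sort_keys=True))
        if key not in seen:
            seen.add(key)
            unique.append(call)
    return unique
```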
Memory file was failing with permission denied because /app is
owned by root. Now uses CAAL_MEMORY_DIR=/app/data (the caal-memory
volume) and entrypoint ensures directory is writable by agent user.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prevents the LLM from using memory data in the initial greeting.
Memory context is now skipped when there are no user messages yet.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…haining

Context injection helps the LLM know what's in memory so it can
chain tools correctly (e.g. memory_short → flight_tracker). Without
it, the model may skip memory and go to other tools directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…missions

- Memory detail modal now has pencil icon to edit values in-place
- Add registry_cache.json symlink to entrypoint.sh (same pattern as
  settings.json) to fix permission denied on /app/registry_cache.json

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ministral-3's recommended instruction temperature is 0.15. The old 0.7
default was overriding the Modelfile setting on every API call.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sophist-UK commented Feb 9, 2026

You have probably thought about this more thoroughly than I have, but...

  1. Don't we need several types of memory? Short-term for day-to-day transient data, like package numbers? Task memory for repeating cron tasks? Skills memory for new repeatable skills, i.e. saved context for a shorter way to repeat a command? Knowledge - when Caal researches something, shouldn't it retain that knowledge? Isn't this how openclaw becomes so useful?

  2. What technologies should be used for memory? Redis? Vector db? MCP?

  3. AI hallucinations are partly due to lack of knowledge, but also due to false knowledge. What do we need to ensure that the memory is correct and factual?

  4. Forgetting - in the long run, forgetting is probably as important as remembering; otherwise Caal is likely to grind to a halt. Some memories are only needed for a predefined period. Some are personal and unique; some are learned off the internet and can be forgotten and then relearned. Some are accurate when they are remembered, but the world changes around them and they become inaccurate. Each memory probably needs metadata so that Caal knows when to forget.

  5. Context - when I want to remember something I don't want my brain to be flooded with every memory, however irrelevant - but equally I don't want to be starved of memory because I can't give it a precise index id. So memory needs to be fuzzy and contextual. Some context can be precise - if I am asking about a delivery, you only want details of recent packages, not my entire history - but other context might be more general (like "I asked you about something to do with travel last week" when the history includes taxis, flights, hotels, reviewing car models, reviewing holiday destinations etc., and you got it wrong because it was 2.5 weeks ago).

IMO memory is likely to be the difference between Siri and openclaw, i.e. limited vs limitless.

In other words, it's complicated. But there must already be a lot of research on this, so there's probably no need to reinvent the wheel.

cmac86 (Member, Author) commented Feb 9, 2026

@Sophist-UK, great comment, and you're touching on something I've been thinking about a lot.

You're right that memory has layers. Short-term is step one - this PR covers transient data like flight numbers, package tracking, things that are useful for a few days and then expire. TTL-based, simple, predictable.

Long-term memory is planned as well. Thinking graph-based (something like Graphiti) with embeddings for contextual retrieval. This is where "Corey prefers morning flights" or relationships between contacts and preferences would live. Your points about forgetting and fuzzy contextual search are spot on for that layer - metadata-driven expiry and hybrid search (semantic + keyword) are likely where that heads. The trick is to retrieve that information when necessary and inject it.

Where CAAL's approach differs from what you might be picturing is the role memory plays. In CAAL's architecture, the LLM is a router - it decides which tool to call and with what parameters, and then deterministic n8n workflows execute. Memory serves that routing. When you say "is my flight on time?" the model needs to know which flight so it can call the right tool with the right parameters. It's not accumulating capability or learning new skills - memory is the data that gives the model enough context to make better routing decisions.

Skills in CAAL are n8n workflows. They go through review (automated + human) before they're live. The model can build new workflows (we showed this in a previous video) but it has to be prompted to do so, and the method is through calling another workflow that uses a larger LLM to generate the workflow. That boundary is intentional - it's what lets an 8B model be reliable and secure. The model doesn't need to be smart enough to self-improve, it needs to be smart enough to route.

So to your five points: 1 and 2 - yes, layered memory with graph + embeddings is on the roadmap. 3 - agreed, and scoping what the model can do with memory (route, not execute) helps bound that risk. 4 - absolutely, TTL is built into this PR and long-term will need smarter expiry. 5 - contextual retrieval is key for the long-term layer.

Appreciate the thoughtful input. This is exactly the kind of discussion that helps shape the architecture.

Any experience with Graphiti or similar?

cmac
